Problems and Approaches for Oriental Document Analysis

Author

  • Jin Hyung Kim

Abstract

Machine understanding of hand-filled documents in China, Japan, and Korea requires not only general solutions to document analysis but also the ability to handle the peculiarities of the Oriental languages. As expected, handwritten Chinese character recognition is the major task. In addition, Japanese Kana, Korean Hangul, the Roman alphabet, and numerals are targets of recognition. The main difficulty of Oriental character recognition stems from the large character sets involved: a practical system should be able to handle at least 5,000 classes out of possibly more than 50,000, and for Hangul alone, 11,172 syllable classes can be formed in theory. The difficulty also depends closely on writing style. Oriental script is generally classified into three styles: regular, fluent, and cursive. Cursive style is the most severely deformed and therefore the most difficult to recognize. Regular-style writing has often been recognized successfully by feature matching and statistical analysis, while fluent style is under active investigation using stroke analysis and structural matching. Cursive-style recognition is seldom addressed even in research papers. Because Chinese and Hangul characters are intrinsically hierarchical, hierarchical analysis has often been applied. A Hangul character, which corresponds to a syllable, is formed by arranging 2 to 5 basic graphemes, drawn from 24 classes, in two dimensions. We believe that recognizing the component graphemes is a viable approach to handling the large set of Hangul classes. Segmentation into graphemes, itself a difficult task, is therefore the key to hierarchical recognition. For robust recognition of fluent to cursive styles, the following research directions are suggested.
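To make the hierarchical argument concrete, here is a minimal sketch built on the Hangul syllable arithmetic defined in the Unicode Standard. This is not a method from this paper. The code splits a precomposed syllable into initial-consonant, vowel, and final-consonant indices and rebuilds it from them. Unicode's inventory (19 initial consonants, 21 vowels, and 27 final consonants, including compound letters) differs from the 24 basic grapheme classes counted above, so the numbers are illustrative only. `compose_best` is a hypothetical final stage of a grapheme-based recognizer. It assumes an upstream classifier, not shown, has already scored every class at each grapheme position. It does not attempt the hard part, which is segmenting the image into graphemes.

```python
from collections.abc import Sequence

# Hangul syllable arithmetic as defined in the Unicode Standard (ch. 3, "Hangul Syllables").
# Unicode's jamo inventory differs from the 24 basic grapheme classes in the abstract.
S_BASE = 0xAC00               # first precomposed syllable, U+AC00
L_COUNT = 19                  # initial consonants (choseong)
V_COUNT = 21                  # vowels (jungseong)
T_COUNT = 28                  # final consonants (jongseong); index 0 means "no final"
N_COUNT = V_COUNT * T_COUNT   # 588 syllables share each initial consonant
S_COUNT = L_COUNT * N_COUNT   # 11,172 precomposed syllables in total


def decompose(syllable: str) -> tuple[int, int, int]:
    """Split one precomposed syllable into (initial, vowel, final) indices."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return l_index, v_index, t_index


def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Rebuild a syllable from its grapheme indices (inverse of decompose)."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT and 0 <= t_index < T_COUNT):
        raise ValueError("grapheme index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def _argmax(scores: Sequence[float]) -> int:
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def compose_best(l_scores: Sequence[float], v_scores: Sequence[float],
                 t_scores: Sequence[float]) -> str:
    """HYPOTHETICAL last stage of a grapheme-based recognizer.

    Assumes an upstream (not shown) classifier scored every class at each
    grapheme position; picks the best class per position and assembles the syllable.
    """
    return compose(_argmax(l_scores), _argmax(v_scores), _argmax(t_scores))


if __name__ == "__main__":
    print(S_COUNT)                      # 11172 whole-syllable classes
    print(L_COUNT + V_COUNT + T_COUNT)  # 68 per-position classes (incl. "no final")
    print(decompose("한"))              # (18, 0, 4): initial ㅎ, vowel ㅏ, final ㄴ
    print(compose(*decompose("글")))    # 글
```

For example, `decompose("한")` returns `(18, 0, 4)` and `compose(18, 0, 4)` returns `"한"`. The class counts show what recognizing graphemes buys. Three small classifiers over 19, 21, and 28 classes stand in for a single 11,172-way classifier. The price is that the segmentation step must locate each grapheme correctly.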

Similar resources

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country, and educational background. Existing approaches to author profiling suffer from problems such as high feature dimensionality and fail to capture th...

A New Text-Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking

Today, the importance of text processing and its applications is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need effective ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...

Mapping of McGraw Cycle to RUP Methodology for Secure Software Developing

Designing secure software is one of the major phases in developing robust software. The McGraw life cycle, one of the well-known approaches to secure software development, implements different touch points as a collection of software security practices. Each touch point includes explicit instructions for applying security in terms of design, coding, measurement, and maintenance of softwar...

Approaches to sustainable development of arid regions

Interest in arid ecosystems as a source of food supply for the world's rapidly increasing population, on the one hand, and the problems that climatic characteristics, water shortages, and soil sensitivity pose for the inhabitants of such areas, on the other, have prompted efforts at both national and international levels to develop a better understanding of these problems as well as to f...

Computational investigation of ginsenoside F1 from Panax ginseng Meyer as p38 MAP Kinase Inhibitor: Molecular docking and dynamics simulations, ADMET analysis, and drug likeness prediction.

Ginsenoside F1 is a biologically active compound identified in Korean Panax ginseng Meyer. In the present study, the potential targets of ginsenoside F1 were investigated by computational target-fishing approaches, including ADMET prediction, prediction of biological activity from chemical structure, molecular docking, and molecular dynamics methods. Results were suggested to express th...

Publication date: 1997